Many colleges want to maximize the donations they receive from their alumni. To do so, they need to predict the salaries and unemployment rates of recent graduates based on their education and other characteristics. With those predictions, colleges can direct more funding toward the programs that yield the largest return on their investment in students.
Business Question:
Where can colleges put money in order to optimize the amount of money they receive from recent graduates?
Analysis Question:
Based on the characteristics and education of recent graduates, what is their predicted median salary? In particular, would they earn more or less than $50,000?
This data is pulled from the 2010-12 American Community Survey Public Use Microdata Series, and is limited to respondents under the age of 28. The general purpose of this code and data is based on the story [The Economic Guide to Picking a College Major](https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/).
We will frame this as a classification problem: after cleaning and encoding the data, we train C5.0 decision-tree models and a random forest to predict whether a major's graduates clear the salary threshold, then evaluate each model on held-out tuning and test sets.
A brief look at the raw data can be found below.
## 'data.frame': 172 obs. of 21 variables:
## $ Rank : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Major_code : int 2419 2416 2415 2417 2405 2418 6202 5001 2414 2408 ...
## $ Major : chr "PETROLEUM ENGINEERING" "MINING AND MINERAL ENGINEERING" "METALLURGICAL ENGINEERING" "NAVAL ARCHITECTURE AND MARINE ENGINEERING" ...
## $ Total : int 2339 756 856 1258 32260 2573 3777 1792 91227 81527 ...
## $ Men : int 2057 679 725 1123 21239 2200 2110 832 80320 65511 ...
## $ Women : int 282 77 131 135 11021 373 1667 960 10907 16016 ...
## $ Major_category : chr "Engineering" "Engineering" "Engineering" "Engineering" ...
## $ ShareWomen : num 0.121 0.102 0.153 0.107 0.342 ...
## $ Sample_size : int 36 7 3 16 289 17 51 10 1029 631 ...
## $ Employed : int 1976 640 648 758 25694 1857 2912 1526 76442 61928 ...
## $ Full_time : int 1849 556 558 1069 23170 2038 2924 1085 71298 55450 ...
## $ Part_time : int 270 170 133 150 5180 264 296 553 13101 12695 ...
## $ Full_time_year_round: int 1207 388 340 692 16697 1449 2482 827 54639 41413 ...
## $ Unemployed : int 37 85 16 40 1672 400 308 33 4650 3895 ...
## $ Unemployment_rate : num 0.0184 0.1172 0.0241 0.0501 0.0611 ...
## $ Median : int 110000 75000 73000 70000 65000 65000 62000 62000 60000 60000 ...
## $ P25th : int 95000 55000 50000 43000 50000 50000 53000 31500 48000 45000 ...
## $ P75th : int 125000 90000 105000 80000 75000 102000 72000 109000 70000 72000 ...
## $ College_jobs : int 1534 350 456 529 18314 1142 1768 972 52844 45829 ...
## $ Non_college_jobs : int 364 257 176 102 4440 657 314 500 16384 10874 ...
## $ Low_wage_jobs : int 193 50 0 0 972 244 259 220 3253 3170 ...
## - attr(*, "na.action")= 'omit' Named int 22
## ..- attr(*, "names")= chr "22"
As can be seen above, most of the columns are integer-valued, and several of them can be converted into factor variables alongside the numeric ones. The variables Rank, Major_code, and Major can be dropped: Rank is highly correlated with the salary variable, and the other two are too specific to generalize from.
# Add two categorical targets and drop Rank, Major_code, and Major (columns 1-3).
# Note: the 0.5 cutoff means a major is "High" only above 50% unemployment,
# so every major in this data ends up labeled "Low".
majors_added_categorical <- majors_raw %>%
  mutate(Over.50K = ifelse(Median > 50000, "Over", "Under.Equal"),
         High.Unemployment = ifelse(Unemployment_rate > 0.5, "High", "Low")) %>%
  select(-1, -2, -3)
In addition, the levels of the Major_category variable can be collapsed into four broader groups, making the variable more useful for the analysis.
##
## Sciences Arts Other STEM
## 54 30 48 40
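The collapsing above can be sketched in base R with a lookup vector. This is a minimal illustration only; the specific assignments of original categories into the four groups are assumptions, not the exact mapping used to produce the table above.

```r
# Hypothetical mapping from original Major_category labels into four broad
# groups; the specific assignments here are illustrative assumptions.
category_map <- c(
  "Engineering"               = "STEM",
  "Computers & Mathematics"   = "STEM",
  "Physical Sciences"         = "Sciences",
  "Biology & Life Science"    = "Sciences",
  "Arts"                      = "Arts",
  "Humanities & Liberal Arts" = "Arts",
  "Business"                  = "Other"
)

collapse_category <- function(x) {
  # Look up each original label and return a factor with the four new levels.
  factor(unname(category_map[x]),
         levels = c("Sciences", "Arts", "Other", "STEM"))
}

table(collapse_category(c("Engineering", "Business", "Arts", "Physical Sciences")))
```

The same idea is what `forcats::fct_collapse` does in one call, which is the function used later in this document for the combined target.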
Before modeling, the categorical variables need to be one-hot encoded, which is done below:
# One Hot Encoded Data
majors_onehot <- one_hot(data.table(majors_factors), cols = c("Major_category", "High.Unemployment"))
# Normal Data
majors <- majors_factors
Before beginning the analytical part of the exploration, it is beneficial to visualize and summarize the data to better understand it as a whole, with an emphasis on the variables believed to be most important for the analysis.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 22000 33000 36000 40077 45000 110000
## Total Men Women ShareWomen Sample_size Employed
## Total 1.0000000 0.8780884 0.9447645 0.1429993 0.9455747 0.9962140
## Men 0.8780884 1.0000000 0.6727589 -0.1120136 0.8751756 0.8706047
## Women 0.9447645 0.6727589 1.0000000 0.2978321 0.8626064 0.9440365
## ShareWomen 0.1429993 -0.1120136 0.2978321 1.0000000 0.0974957 0.1475468
## Sample_size 0.9455747 0.8751756 0.8626064 0.0974957 1.0000000 0.9644062
## Full_time Part_time Full_time_year_round Unemployed
## Total 0.9893392 0.9502684 0.9811118 0.9747684
## Men 0.8935631 0.7515917 0.8924540 0.8694115
## Women 0.9176812 0.9545133 0.9057195 0.9116943
## ShareWomen 0.1202001 0.2122898 0.1125230 0.1212430
## Sample_size 0.9783624 0.8245444 0.9852125 0.9179335
## Unemployment_rate Median P25th P75th College_jobs
## Total 0.08319170 -0.1067377 -0.07192608 -0.08319767 0.8004648
## Men 0.10150234 0.0259906 0.03872518 0.05239290 0.5631684
## Women 0.05910776 -0.1828419 -0.13773826 -0.16452834 0.8519460
## ShareWomen 0.07320458 -0.6186898 -0.50019863 -0.58693216 0.1955501
## Sample_size 0.06295494 -0.0644750 -0.02442859 -0.05225614 0.7012309
## Non_college_jobs Low_wage_jobs
## Total 0.9412471 0.9355096
## Men 0.8514998 0.7913360
## Women 0.8721318 0.9044699
## ShareWomen 0.1370066 0.1878496
## Sample_size 0.9153352 0.8601159
## [1] 172 22
## [1] 121 22
## [1] 26 22
## [1] 25 22
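The summary statistics, correlation matrix, and dimension checks above can all be produced with base R. A self-contained sketch on a toy data frame (the column names mirror the real data; the values are made up for illustration):

```r
# Toy stand-in for the majors data; values are illustrative only.
toy <- data.frame(
  Median     = c(110000, 75000, 73000, 40000, 33000),
  ShareWomen = c(0.12, 0.10, 0.15, 0.55, 0.70),
  Employed   = c(1976, 640, 648, 2500, 3100)
)

summary(toy$Median)  # five-number summary plus the mean, as printed above
cor(toy)             # pairwise Pearson correlations between numeric columns
dim(toy)             # rows and columns, as in the dim() output above
```

Even in this toy, Median and ShareWomen are negatively correlated, echoing the roughly -0.62 correlation seen in the real matrix above.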
## Classes 'data.table' and 'data.frame': 121 obs. of 21 variables:
## $ Total : int 2339 756 856 2573 3777 91227 81527 41542 15058 14955 ...
## $ Men : int 2057 679 725 2200 2110 80320 65511 33258 12953 8407 ...
## $ Women : int 282 77 131 373 1667 10907 16016 8284 2105 6548 ...
## $ Major_category_Sciences: int 0 0 0 0 0 0 0 0 0 0 ...
## $ Major_category_Arts : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Major_category_Other : int 0 0 0 0 1 0 0 0 0 0 ...
## $ Major_category_STEM : int 1 1 1 1 0 1 1 1 1 1 ...
## $ ShareWomen : num 0.121 0.102 0.153 0.145 0.441 ...
## $ Sample_size : int 36 7 3 17 51 1029 631 399 147 79 ...
## $ Employed : int 1976 640 648 1857 2912 76442 61928 32506 11391 10047 ...
## $ Full_time : int 1849 556 558 2038 2924 71298 55450 30315 11106 9017 ...
## $ Part_time : int 270 170 133 264 296 13101 12695 5146 2724 2694 ...
## $ Full_time_year_round : int 1207 388 340 1449 2482 54639 41413 23621 8790 5986 ...
## $ Unemployed : int 37 85 16 400 308 4650 3895 2275 794 1019 ...
## $ Unemployment_rate : num 0.0184 0.1172 0.0241 0.1772 0.0957 ...
## $ P25th : int 95000 55000 50000 50000 53000 48000 45000 45000 42000 36000 ...
## $ P75th : int 125000 90000 105000 102000 72000 70000 72000 75000 70000 70000 ...
## $ College_jobs : int 1534 350 456 1142 1768 52844 45829 23694 8184 6439 ...
## $ Non_college_jobs : int 364 257 176 657 314 16384 10874 5721 2425 2471 ...
## $ Low_wage_jobs : int 193 50 0 244 259 3253 3170 980 372 789 ...
## $ High.Unemployment_Low : int 1 1 1 1 1 1 1 1 1 1 ...
## - attr(*, ".internal.selfref")=<externalptr>
## C5.0
##
## 121 samples
## 21 predictor
## 2 classes: 'Over', 'Under.Equal'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 5 times)
## Summary of sample sizes: 109, 109, 108, 110, 109, 110, ...
## Resampling results across tuning parameters:
##
## model winnow trials Accuracy Kappa
## rules FALSE 1 0.9279604 0.6789588
## rules FALSE 10 0.9276224 0.7510563
## rules FALSE 20 0.9292890 0.7585563
## rules TRUE 1 0.9398019 0.7054988
## rules TRUE 10 0.9280070 0.6870446
## rules TRUE 20 0.9280070 0.6870446
## tree FALSE 1 0.9278089 0.6813726
## tree FALSE 10 0.9359557 0.7880563
## tree FALSE 20 0.9327506 0.7786667
## tree TRUE 1 0.9398019 0.7054988
## tree TRUE 10 0.9280070 0.6870446
## tree TRUE 20 0.9280070 0.6870446
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were trials = 1, model = rules and winnow
## = TRUE.
## Confusion Matrix and Statistics
##
## Actual
## Prediction Over Under.Equal
## Over 3 1
## Under.Equal 1 21
##
## Accuracy : 0.9231
## 95% CI : (0.7487, 0.9905)
## No Information Rate : 0.8462
## P-Value [Acc > NIR] : 0.214
##
## Kappa : 0.7045
##
## Mcnemar's Test P-Value : 1.000
##
## Sensitivity : 0.7500
## Specificity : 0.9545
## Pos Pred Value : 0.7500
## Neg Pred Value : 0.9545
## Prevalence : 0.1538
## Detection Rate : 0.1154
## Detection Prevalence : 0.1538
## Balanced Accuracy : 0.8523
##
## 'Positive' Class : Over
##
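The statistics reported by `confusionMatrix()` can be reproduced by hand from the 2x2 table above. A base-R check, using the counts from the printed matrix with "Over" as the positive class:

```r
# Counts from the printed confusion matrix (rows = prediction, cols = actual).
tp <- 3; fp <- 1; fn <- 1; tn <- 21
n  <- tp + fp + fn + tn

accuracy    <- (tp + tn) / n        # 24 / 26
sensitivity <- tp / (tp + fn)       # recall on the "Over" class
specificity <- tn / (tn + fp)

# Cohen's kappa: observed agreement corrected for chance agreement.
p_yes <- ((tp + fp) / n) * ((tp + fn) / n)
p_no  <- ((fn + tn) / n) * ((fp + tn) / n)
kappa <- (accuracy - (p_yes + p_no)) / (1 - (p_yes + p_no))

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, kappa = kappa), 4)
```

These recover the Accuracy (0.9231), Sensitivity (0.7500), Specificity (0.9545), and Kappa (0.7045) printed above.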
# Given certain values for the other variables, predict the median salary class
## C5.0 variable importance
##
## only 20 most important variables shown (out of 21)
##
## Overall
## P75th 100.000
## P25th 81.358
## Major_category_STEM 80.507
## ShareWomen 4.235
## College_jobs 0.000
## Employed 0.000
## Non_college_jobs 0.000
## Full_time 0.000
## Sample_size 0.000
## Unemployment_rate 0.000
## Total 0.000
## Men 0.000
## Part_time 0.000
## Unemployed 0.000
## Major_category_Arts 0.000
## Low_wage_jobs 0.000
## High.Unemployment_Low 0.000
## Major_category_Sciences 0.000
## Women 0.000
## Major_category_Other 0.000
## C5.0
##
## 121 samples
## 21 predictor
## 2 classes: 'Over', 'Under.Equal'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 5 times)
## Summary of sample sizes: 109, 109, 108, 110, 109, 110, ...
## Resampling results across tuning parameters:
##
## model winnow trials Accuracy Kappa
## rules FALSE 20 0.9292890 0.7585563
## rules FALSE 30 0.9292890 0.7585563
## rules FALSE 40 0.9292890 0.7585563
## rules TRUE 20 0.9280070 0.6870446
## rules TRUE 30 0.9280070 0.6870446
## rules TRUE 40 0.9280070 0.6870446
## tree FALSE 20 0.9327506 0.7786667
## tree FALSE 30 0.9359557 0.7880563
## tree FALSE 40 0.9359557 0.7880563
## tree TRUE 20 0.9280070 0.6870446
## tree TRUE 30 0.9280070 0.6870446
## tree TRUE 40 0.9280070 0.6870446
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were trials = 30, model = tree and winnow
## = FALSE.
## C5.0
##
## 121 samples
## 21 predictor
## 2 classes: 'Over', 'Under.Equal'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 5 times)
## Summary of sample sizes: 109, 109, 108, 110, 109, 110, ...
## Resampling results across tuning parameters:
##
## model winnow trials Accuracy Kappa
## rules FALSE 1 0.9279604 0.6789588
## rules FALSE 10 0.9276224 0.7510563
## rules FALSE 20 0.9292890 0.7585563
## rules TRUE 1 0.9398019 0.7054988
## rules TRUE 10 0.9280070 0.6870446
## rules TRUE 20 0.9280070 0.6870446
## tree FALSE 1 0.9278089 0.6813726
## tree FALSE 10 0.9359557 0.7880563
## tree FALSE 20 0.9327506 0.7786667
## tree TRUE 1 0.9398019 0.7054988
## tree TRUE 10 0.9280070 0.6870446
## tree TRUE 20 0.9280070 0.6870446
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were trials = 1, model = rules and winnow
## = TRUE.
## Confusion Matrix and Statistics
##
## Actual
## Prediction Over Under.Equal
## Over 3 1
## Under.Equal 1 21
##
## Accuracy : 0.9231
## 95% CI : (0.7487, 0.9905)
## No Information Rate : 0.8462
## P-Value [Acc > NIR] : 0.214
##
## Kappa : 0.7045
##
## Mcnemar's Test P-Value : 1.000
##
## Sensitivity : 0.7500
## Specificity : 0.9545
## Pos Pred Value : 0.7500
## Neg Pred Value : 0.9545
## Prevalence : 0.1538
## Detection Rate : 0.1154
## Detection Prevalence : 0.1538
## Balanced Accuracy : 0.8523
##
## 'Positive' Class : Over
##
## Confusion Matrix and Statistics
##
## Actual
## Prediction Over Under.Equal
## Over 2 2
## Under.Equal 1 20
##
## Accuracy : 0.88
## 95% CI : (0.6878, 0.9745)
## No Information Rate : 0.88
## P-Value [Acc > NIR] : 0.6475
##
## Kappa : 0.5033
##
## Mcnemar's Test P-Value : 1.0000
##
## Sensitivity : 0.6667
## Specificity : 0.9091
## Pos Pred Value : 0.5000
## Neg Pred Value : 0.9524
## Prevalence : 0.1200
## Detection Rate : 0.0800
## Detection Prevalence : 0.1600
## Balanced Accuracy : 0.7879
##
## 'Positive' Class : Over
##
# Create a combined target: median salary scaled by (1 - unemployment rate)
# and the share of women in the major
combined_target <- majors$Median * (1 - majors$Unemployment_rate) * majors$ShareWomen
majors_combined_target <- data.frame(majors, combined_target)
# view(majors_combined_target)
# Binarize the combined target at 20,000
majors_combined_target$combined_target <- ifelse(majors_combined_target$combined_target > 20000, 1, 0)
# Added as a predictor rather than replacing the numeric version
(majors_combined_target$combined_target <- cut(majors_combined_target$combined_target, c(-1, 0.3953488, 1), labels = c(0, 1)))
## [1] 0 0 0 0 1 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0
## [38] 0 1 1 0 0 0 1 0 1 0 1 1 0 1 0 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0 1 0 1 0 0 0 0
## [75] 0 0 1 0 0 1 0 0 0 0 0 1 1 1 0 1 1 0 1 0 1 1 0 1 1 1 1 0 1 1 0 0 0 1 1 0 0
## [112] 0 1 0 1 1 0 1 1 1 0 0 0 1 0 1 1 1 1 0 1 0 0 1 0 1 1 1 0 1 0 0 0 1 1 0 0 1
## [149] 0 1 1 1 1 1 1 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0
## Levels: 0 1
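The `cut()` call above may look odd: by this point the target is already 0/1, so breaks at -1, 0.3953488, and 1 simply send 0 to the first bin and 1 to the second, converting the numeric indicator into a factor. A small sketch of that behavior:

```r
x <- c(0, 1, 1, 0)
# Values in (-1, 0.3953488] get label "0"; values in (0.3953488, 1] get "1".
binned <- cut(x, breaks = c(-1, 0.3953488, 1), labels = c(0, 1))
binned
```

`factor(x, levels = c(0, 1))` would have achieved the same conversion more directly.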
majors_combined_target$combined_target <- fct_collapse(majors_combined_target$combined_target, "LE.EQ.20K"="0", "G.50K"="1")
majors_combined_target <- majors_combined_target %>%
mutate(combined_target = factor(combined_target, labels = make.names(levels(combined_target))))
str(majors_combined_target)
## 'data.frame': 172 obs. of 21 variables:
## $ Total : int 2339 756 856 1258 32260 2573 3777 1792 91227 81527 ...
## $ Men : int 2057 679 725 1123 21239 2200 2110 832 80320 65511 ...
## $ Women : int 282 77 131 135 11021 373 1667 960 10907 16016 ...
## $ Major_category : Factor w/ 4 levels "Sciences","Arts",..: 4 4 4 4 4 4 3 1 4 4 ...
## $ ShareWomen : num 0.121 0.102 0.153 0.107 0.342 ...
## $ Sample_size : int 36 7 3 16 289 17 51 10 1029 631 ...
## $ Employed : int 1976 640 648 758 25694 1857 2912 1526 76442 61928 ...
## $ Full_time : int 1849 556 558 1069 23170 2038 2924 1085 71298 55450 ...
## $ Part_time : int 270 170 133 150 5180 264 296 553 13101 12695 ...
## $ Full_time_year_round: int 1207 388 340 692 16697 1449 2482 827 54639 41413 ...
## $ Unemployed : int 37 85 16 40 1672 400 308 33 4650 3895 ...
## $ Unemployment_rate : num 0.0184 0.1172 0.0241 0.0501 0.0611 ...
## $ Median : int 110000 75000 73000 70000 65000 65000 62000 62000 60000 60000 ...
## $ P25th : int 95000 55000 50000 43000 50000 50000 53000 31500 48000 45000 ...
## $ P75th : int 125000 90000 105000 80000 75000 102000 72000 109000 70000 72000 ...
## $ College_jobs : int 1534 350 456 529 18314 1142 1768 972 52844 45829 ...
## $ Non_college_jobs : int 364 257 176 102 4440 657 314 500 16384 10874 ...
## $ Low_wage_jobs : int 193 50 0 0 972 244 259 220 3253 3170 ...
## $ Over.50K : Factor w/ 2 levels "Over","Under.Equal": 1 1 1 1 1 1 1 1 1 1 ...
## $ High.Unemployment : Factor w/ 1 level "Low": 1 1 1 1 1 1 1 1 1 1 ...
## $ combined_target : Factor w/ 2 levels "LE.EQ.20K","G.50K": 1 1 1 1 2 1 2 2 1 1 ...
# Determine the base rate (prevalence) of the positive class for the classifier
(prevalence <- table(majors_combined_target$combined_target)[[2]]/length(majors_combined_target$combined_target))
## [1] 0.3953488
table(majors_combined_target$combined_target)
##
## LE.EQ.20K G.50K
## 104 68
# Split data into Train, Tune, Test
part_index_1 <- caret::createDataPartition(majors_combined_target$combined_target,
times=1,
p = 0.70,
groups=1,
list=FALSE)
train <- majors_combined_target[part_index_1, ]
tune_and_test <- majors_combined_target[-part_index_1, ]
# Then we use the function again to split the remainder into tuning and test sets
tune_and_test_index <- createDataPartition(tune_and_test$combined_target,
p = .5,
list = FALSE,
times = 1)
tune <- tune_and_test[tune_and_test_index, ]
test <- tune_and_test[-tune_and_test_index, ]
dim(train)
## [1] 121 21
dim(test)
## [1] 25 21
dim(tune)
## [1] 26 21
# these are slightly off because the data set isn't perfectly even
#Calculate the initial mtry level
mytry_tune <- function(x){
y <- dim(x)[2]-1
sqrt(y)
}
mytry_tune(majors_combined_target)
## [1] 4.472136
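With 21 columns and one response, the heuristic above evaluates to sqrt(20), roughly 4.47, which is why mtry = 4 is used below. A quick check (the floor() rounding step is an assumption about how the final value was chosen):

```r
# Same heuristic as above: square root of the number of predictors.
mytry_tune <- function(x) {
  y <- dim(x)[2] - 1  # predictors = columns minus the response
  sqrt(y)
}

# Toy frame with 21 columns, matching majors_combined_target's shape.
toy <- as.data.frame(matrix(0, nrow = 1, ncol = 21))
mytry_tune(toy)         # ~4.472136, as printed above
floor(mytry_tune(toy))  # 4, the value passed to mtry below
```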
#Creating an initial random forest model with 500 trees
set.seed(2023)
combined_RF = randomForest(combined_target~., #<- Formula: response variable ~ predictors.
# The period means 'use all other variables in the data'.
train, #<- A data frame with the variables to be used.
#y = NULL, #<- A response vector. This is unnecessary because we're specifying a response formula.
#subset = NULL, #<- This is unnecessary because we're using all the rows in the training data set.
#xtest = NULL, #<- This is already defined in the formula by the ".".
#ytest = NULL, #<- This is already defined in the formula by "combined_target".
ntree = 500, #<- Number of trees to grow. This should not be set to too small a number, to ensure that every input row gets classified at least a few times.
mtry = 4, #<- Number of variables randomly sampled as candidates at each split. Default number for classification is sqrt(# of variables). Default number for regression is (# of variables / 3).
replace = TRUE, #<- Should sampled data points be replaced.
#classwt = NULL, #<- Priors of the classes. Use this if you want to specify what proportion of the data SHOULD be in each class. This is relevant if your sample data is not completely representative of the actual population
#strata = NULL, #<- Not necessary for our purpose here.
sampsize = 100, #<- Size of sample to draw each time.
nodesize = 5, #<- Minimum numbers of data points in terminal nodes.
#maxnodes = NULL, #<- Limits the number of maximum splits.
importance = TRUE, #<- Should importance of predictors be assessed?
#localImp = FALSE, #<- Should casewise importance measure be computed? (Setting this to TRUE will override importance.)
proximity = FALSE, #<- Should a proximity measure between rows be calculated?
norm.votes = TRUE, #<- If TRUE (default), the final result of votes are expressed as fractions. If FALSE, raw vote counts are returned (useful for combining results from different runs).
do.trace = TRUE, #<- If set to TRUE, give a more verbose output as randomForest is run.
keep.forest = TRUE, #<- If set to FALSE, the forest will not be retained in the output object. If xtest is given, defaults to FALSE.
keep.inbag = TRUE) #<- Should an n by ntree matrix be returned that keeps track of which samples are in-bag in which trees?
## ntree OOB 1 2
## 1: 29.82% 36.67% 22.22%
## 2: 23.26% 25.49% 20.00%
## 3: 24.75% 23.73% 26.19%
## 4: 21.10% 19.05% 23.91%
## 5: 22.81% 16.18% 32.61%
## 6: 22.88% 18.31% 29.79%
## 7: 20.00% 15.07% 27.66%
## 8: 22.50% 19.18% 27.66%
## 9: 17.36% 10.96% 27.08%
## 10: 19.01% 12.33% 29.17%
## ... (rows 11-495 omitted: the OOB error fluctuates between roughly
## 15.7% and 21.5% and settles near 19.8% well before 500 trees) ...
## 496: 19.83% 12.33% 31.25%
## 497: 19.83% 12.33% 31.25%
## 498: 19.83% 12.33% 31.25%
## 499: 19.83% 12.33% 31.25%
## 500: 19.83% 12.33% 31.25%
# Look at the output of the random forest.
combined_RF
##
## Call:
## randomForest(formula = combined_target ~ ., data = train, ntree = 500, mtry = 4, replace = TRUE, sampsize = 100, nodesize = 5, importance = TRUE, proximity = FALSE, norm.votes = TRUE, do.trace = TRUE, keep.forest = TRUE, keep.inbag = TRUE)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 4
##
## OOB estimate of error rate: 19.83%
## Confusion matrix:
## LE.EQ.20K G.50K class.error
## LE.EQ.20K 64 9 0.1232877
## G.50K 15 33 0.3125000
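As a sanity check, the 19.83% OOB estimate above can be reproduced directly from the confusion matrix: it is just the misclassified count divided by the total. A minimal sketch rebuilding the printed matrix (object name `cm` is ours):

```r
# Rebuild the printed confusion matrix and verify that the OOB error
# rate equals misclassified / total = (9 + 15) / 121.
cm <- matrix(c(64, 9, 15, 33), nrow = 2, byrow = TRUE,
             dimnames = list(c("LE.EQ.20K", "G.50K"),
                             c("LE.EQ.20K", "G.50K")))
oob_error <- (cm["LE.EQ.20K", "G.50K"] + cm["G.50K", "LE.EQ.20K"]) / sum(cm)
round(oob_error, 4)  # 0.1983
```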
# Determining the number of trees that should be used.
# The "err.rate" component contains the cumulative error rates for each
# number of trees, in aggregate and by class, computed on the data points
# left out of each tree's bootstrap sample (out-of-bag, or OOB).
# View(as.data.frame(combined_RF$err.rate))
err.rate <- as.data.frame(combined_RF$err.rate)
# View(err.rate)
# The "oob.times" component records how many times each data point was
# out-of-bag, i.e. excluded from a tree's bootstrap sample.
# View(as.data.frame(combined_RF$oob.times))
combined_RF_error = data.frame(1:nrow(combined_RF$err.rate),
combined_RF$err.rate)
head(combined_RF_error, 10) # the full 500-row table is omitted; use View(combined_RF_error) to inspect all rows
## X1.nrow.combined_RF.err.rate. OOB LE.EQ.20K G.50K
## 1 1 0.2982456 0.36666667 0.2222222
## 2 2 0.2325581 0.25490196 0.2000000
## 3 3 0.2475248 0.23728814 0.2619048
## 4 4 0.2110092 0.19047619 0.2391304
## 5 5 0.2280702 0.16176471 0.3260870
## 6 6 0.2288136 0.18309859 0.2978723
## 7 7 0.2000000 0.15068493 0.2765957
## 8 8 0.2250000 0.19178082 0.2765957
## 9 9 0.1735537 0.10958904 0.2708333
## 10 10 0.1900826 0.12328767 0.2916667
colnames(combined_RF_error) = c("Number of Trees", "Out of Bag", "LE.EQ.20K", "G.50K")
combined_RF_error$Diff <- combined_RF_error$G.50K - combined_RF_error$LE.EQ.20K
# View(combined_RF_error)
# 54 trees should be used because that count corresponds to a low OOB error
# together with a low G.50K class error.
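Rather than scanning hundreds of printed rows by eye, the tree count with the smallest OOB error can be indexed programmatically. A minimal sketch on toy values (on the real data this would be `combined_RF_error` and its OOB column):

```r
# Toy error table standing in for combined_RF_error: one row per tree
# count, with the cumulative OOB error rate for that many trees.
err <- data.frame(ntree = 1:6,
                  oob   = c(0.298, 0.233, 0.248, 0.211, 0.228, 0.198))
# which.min() returns the row index of the smallest OOB error.
best_ntree <- err$ntree[which.min(err$oob)]
best_ntree  # 6 in this toy table
```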
# Determining the right number of variables to randomly sample (the mtry parameter).
str(train)
## 'data.frame': 121 obs. of 21 variables:
## $ Total : int 2339 756 1258 32260 3777 1792 91227 81527 15058 14955 ...
## $ Men : int 2057 679 1123 21239 2110 832 80320 65511 12953 8407 ...
## $ Women : int 282 77 135 11021 1667 960 10907 16016 2105 6548 ...
## $ Major_category : Factor w/ 4 levels "Sciences","Arts",..: 4 4 4 4 3 1 4 4 4 4 ...
## $ ShareWomen : num 0.121 0.102 0.107 0.342 0.441 ...
## $ Sample_size : int 36 7 16 289 51 10 1029 631 147 79 ...
## $ Employed : int 1976 640 758 25694 2912 1526 76442 61928 11391 10047 ...
## $ Full_time : int 1849 556 1069 23170 2924 1085 71298 55450 11106 9017 ...
## $ Part_time : int 270 170 150 5180 296 553 13101 12695 2724 2694 ...
## $ Full_time_year_round: int 1207 388 692 16697 2482 827 54639 41413 8790 5986 ...
## $ Unemployed : int 37 85 40 1672 308 33 4650 3895 794 1019 ...
## $ Unemployment_rate : num 0.0184 0.1172 0.0501 0.0611 0.0957 ...
## $ Median : int 110000 75000 70000 65000 62000 62000 60000 60000 60000 60000 ...
## $ P25th : int 95000 55000 43000 50000 53000 31500 48000 45000 42000 36000 ...
## $ P75th : int 125000 90000 80000 75000 72000 109000 70000 72000 70000 70000 ...
## $ College_jobs : int 1534 350 529 18314 1768 972 52844 45829 8184 6439 ...
## $ Non_college_jobs : int 364 257 102 4440 314 500 16384 10874 2425 2471 ...
## $ Low_wage_jobs : int 193 50 0 972 259 220 3253 3170 372 789 ...
## $ Over.50K : Factor w/ 2 levels "Over","Under.Equal": 1 1 1 1 1 1 1 1 1 1 ...
## $ High.Unemployment : Factor w/ 1 level "Low": 1 1 1 1 1 1 1 1 1 1 ...
## $ combined_target : Factor w/ 2 levels "LE.EQ.20K","G.50K": 1 1 1 2 2 2 1 1 1 2 ...
set.seed(2)
combined_RF_mtry = tuneRF(data.frame(train[ ,1:20]), #<- data frame of predictor variables
(train[ ,21]), #<- response vector: a factor for classification, a continuous variable for regression
mtryStart = 4, #<- starting value of mtry, the default is the same as in the randomForest function
ntreeTry = 79, #<- number of trees used at the tuning step
stepFactor = 2, #<- at each iteration, mtry is inflated (or deflated) by this value
improve = 0.05, #<- the improvement in OOB error must be by this much for the search to continue
trace = TRUE, #<- whether to print the progress of the search
plot = TRUE, #<- whether to plot the OOB error as a function of mtry
doBest = TRUE) #<- whether to create a random forest using the optimal mtry parameter
## mtry = 4 OOB error = 20.66%
## Searching left ...
## mtry = 2 OOB error = 19.01%
## 0.08 0.05
## mtry = 1 OOB error = 29.75%
## -0.5652174 0.05
## Searching right ...
## mtry = 8 OOB error = 14.88%
## 0.2173913 0.05
## mtry = 16 OOB error = 9.09%
## 0.3888889 0.05
## mtry = 20 OOB error = 11.57%
## -0.2727273 0.05
combined_RF_mtry
##
## Call:
## randomForest(x = x, y = y, mtry = res[which.min(res[, 2]), 1])
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 16
##
## OOB estimate of error rate: 12.4%
## Confusion matrix:
## LE.EQ.20K G.50K class.error
## LE.EQ.20K 65 8 0.1095890
## G.50K 7 41 0.1458333
# Based on the output of combined_RF_mtry, mtry = 16 produced the lowest OOB error (9.09%); mtry = 20 (11.57%) was the next best and, since it samples every predictor at each split, it is carried into the model below.
# Build a random forest classification model for the combined category using the number of trees, number of variables to sample, and sample size that optimize the model output.
set.seed(2023)
combined_RF_2 = randomForest(combined_target~., #<- Formula: response variable ~ predictors.
# The period means 'use all other variables in the data'.
train, #<- A data frame with the variables to be used.
#y = NULL, #<- A response vector. This is unnecessary because we're specifying a response formula.
#subset = NULL, #<- This is unnecessary because we're using all the rows in the training data set.
#xtest = NULL, #<- This is already defined in the formula by the ".".
#ytest = NULL, #<- This is already defined in the formula by "combined_target".
ntree = 54, #<- Number of trees to grow. This should not be set to too small a number, to ensure that every input row gets classified at least a few times.
mtry = 20, #<- Number of variables randomly sampled as candidates at each split. Default number for classification is sqrt(# of variables). Default number for regression is (# of variables / 3).
replace = TRUE, #<- Should sampled data points be replaced.
#classwt = NULL, #<- Priors of the classes. Use this if you want to specify what proportion of the data SHOULD be in each class. This is relevant if your sample data is not completely representative of the actual population
#strata = NULL, #<- Not necessary for our purpose here.
sampsize = 100, #<- Size of sample to draw each time.
nodesize = 5, #<- Minimum numbers of data points in terminal nodes.
#maxnodes = NULL, #<- Limits the number of maximum splits.
importance = TRUE, #<- Should importance of predictors be assessed?
#localImp = FALSE, #<- Should casewise importance measure be computed? (Setting this to TRUE will override importance.)
proximity = FALSE, #<- Should a proximity measure between rows be calculated?
norm.votes = TRUE, #<- If TRUE (default), the final result of votes are expressed as fractions. If FALSE, raw vote counts are returned (useful for combining results from different runs).
do.trace = TRUE, #<- If set to TRUE, give a more verbose output as randomForest is run.
keep.forest = TRUE, #<- If set to FALSE, the forest will not be retained in the output object. If xtest is given, defaults to FALSE.
keep.inbag = TRUE) #<- Should an n by ntree matrix be returned that keeps track of which samples are in-bag in which trees?
## ntree OOB 1 2
## 1: 26.32% 33.33% 18.52%
## 2: 26.44% 24.49% 28.95%
## 3: 26.47% 28.33% 23.81%
## 4: 22.12% 22.06% 22.22%
## 5: 21.37% 21.43% 21.28%
## 6: 20.17% 19.72% 20.83%
## 7: 21.85% 18.31% 27.08%
## 8: 18.49% 18.31% 18.75%
## 9: 19.83% 17.81% 22.92%
## 10: 20.66% 19.18% 22.92%
## 11: 19.01% 17.81% 20.83%
## 12: 17.36% 15.07% 20.83%
## 13: 19.83% 15.07% 27.08%
## 14: 18.18% 13.70% 25.00%
## 15: 17.36% 15.07% 20.83%
## 16: 15.70% 15.07% 16.67%
## 17: 13.22% 12.33% 14.58%
## 18: 14.05% 13.70% 14.58%
## 19: 16.53% 15.07% 18.75%
## 20: 14.05% 10.96% 18.75%
## 21: 14.88% 12.33% 18.75%
## 22: 17.36% 16.44% 18.75%
## 23: 15.70% 15.07% 16.67%
## 24: 15.70% 13.70% 18.75%
## 25: 15.70% 13.70% 18.75%
## 26: 15.70% 15.07% 16.67%
## 27: 17.36% 15.07% 20.83%
## 28: 16.53% 15.07% 18.75%
## 29: 16.53% 15.07% 18.75%
## 30: 15.70% 16.44% 14.58%
## 31: 16.53% 15.07% 18.75%
## 32: 17.36% 15.07% 20.83%
## 33: 16.53% 13.70% 20.83%
## 34: 14.88% 15.07% 14.58%
## 35: 15.70% 13.70% 18.75%
## 36: 17.36% 16.44% 18.75%
## 37: 18.18% 15.07% 22.92%
## 38: 18.18% 16.44% 20.83%
## 39: 18.18% 16.44% 20.83%
## 40: 16.53% 16.44% 16.67%
## 41: 16.53% 16.44% 16.67%
## 42: 14.88% 17.81% 10.42%
## 43: 14.05% 16.44% 10.42%
## 44: 14.88% 17.81% 10.42%
## 45: 15.70% 17.81% 12.50%
## 46: 14.88% 16.44% 12.50%
## 47: 14.88% 16.44% 12.50%
## 48: 14.88% 16.44% 12.50%
## 49: 15.70% 16.44% 14.58%
## 50: 16.53% 16.44% 16.67%
## 51: 16.53% 16.44% 16.67%
## 52: 17.36% 17.81% 16.67%
## 53: 16.53% 16.44% 16.67%
## 54: 16.53% 16.44% 16.67%
# Look at the output of the random forest.
combined_RF_2
##
## Call:
## randomForest(formula = combined_target ~ ., data = train, ntree = 54, mtry = 20, replace = TRUE, sampsize = 100, nodesize = 5, importance = TRUE, proximity = FALSE, norm.votes = TRUE, do.trace = TRUE, keep.forest = TRUE, keep.inbag = TRUE)
## Type of random forest: classification
## Number of trees: 54
## No. of variables tried at each split: 20
##
## OOB estimate of error rate: 16.53%
## Confusion matrix:
## LE.EQ.20K G.50K class.error
## LE.EQ.20K 61 12 0.1643836
## G.50K 8 40 0.1666667
# The sample size was kept at the original value of 100 because this value minimized the class error for both classes. When the sample size was increased or decreased, one class error fell but the other rose significantly. This sample size therefore balances the two class errors and helps prevent overfitting or underfitting.
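The sample-size comparison described above can be automated by refitting the forest over a few candidate values and collecting the per-class errors from each confusion matrix. A hedged sketch, using the built-in `iris` data as a stand-in for this project's `train` data frame (object names `sizes` and `class_err` are ours):

```r
library(randomForest)

# Scan a few sampsize values and record the per-class OOB error for each.
set.seed(2023)
sizes <- c(50, 75, 100)
class_err <- sapply(sizes, function(s) {
  rf <- randomForest(Species ~ ., data = iris, ntree = 54, sampsize = s)
  rf$confusion[, "class.error"]  # one error per class
})
colnames(class_err) <- sizes
round(class_err, 3)  # rows = classes, columns = sampsize values
```

Picking the column whose errors are lowest and most balanced reproduces the reasoning used here for sampsize = 100.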
Because caret's built-in "rf" method only tunes mtry, a custom random forest model specification was created so that all three hyperparameters identified above (mtry, ntree, and sampsize) could be tuned together.
# Tune the model
customRF <- list(type = "Classification", library = "randomForest", loop = NULL)
customRF$parameters <- data.frame(parameter = c("mtry", "ntree", "sampsize"), class = rep("numeric", 3), label = c("mtry", "ntree", "sampsize"))
customRF$grid <- function(x, y, len = NULL, search = "grid") {}
customRF$fit <- function(x, y, wts, param, lev, last, weights, classProbs, ...) {
randomForest(x, y, mtry = param$mtry, ntree = param$ntree, sampsize = param$sampsize, ...) #<- sampsize must be passed through here; without it the tuned sampsize values never reach randomForest
}
customRF$predict <- function(modelFit, newdata, preProc = NULL, submodels = NULL)
predict(modelFit, newdata)
customRF$prob <- function(modelFit, newdata, preProc = NULL, submodels = NULL)
predict(modelFit, newdata, type = "prob")
customRF$sort <- function(x) x[order(x[,1]),]
customRF$levels <- function(x) x$classes
Now we can define the grid of hyperparameter values to try when tuning the model.
## .mtry .sampsize .ntree
## 1 3 50 200
## 2 4 50 200
## 3 5 50 200
## 4 3 100 200
## 5 4 100 200
## 6 5 100 200
## 7 3 200 200
## 8 4 200 200
## 9 5 200 200
## 10 3 50 300
## 11 4 50 300
## 12 5 50 300
## 13 3 100 300
## 14 4 100 300
## 15 5 100 300
## 16 3 200 300
## 17 4 200 300
## 18 5 200 300
## 19 3 50 400
## 20 4 50 400
## 21 5 50 400
## 22 3 100 400
## 23 4 100 400
## 24 5 100 400
## 25 3 200 400
## 26 4 200 400
## 27 5 200 400
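The 27-row grid above was evidently passed to `caret::train` together with the custom method; a sketch of that call follows, assuming the `customRF` object defined above, a training frame `train`, and a two-level target column (its name, `target`, is a placeholder). The `trainControl` settings mirror the resampling summary printed with the results: 5-fold cross-validation repeated 5 times, with ROC as the selection metric.

```r
library(caret)
library(randomForest)

# All 3 x 3 x 3 = 27 combinations of the three hyperparameters.
tune_grid <- expand.grid(mtry = 3:5,
                         sampsize = c(50, 100, 200),
                         ntree = c(200, 300, 400))

# Repeated 5-fold CV; twoClassSummary provides ROC, Sens, and Spec.
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 5,
                     classProbs = TRUE, summaryFunction = twoClassSummary)

set.seed(123)
rf_tuned <- train(target ~ ., data = train,
                  method = customRF, metric = "ROC",
                  tuneGrid = tune_grid, trControl = ctrl)
rf_tuned$bestTune  # the combination with the largest ROC
```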
## 121 samples
## 19 predictor
## 2 classes: 'Over', 'Under.Equal'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold, repeated 5 times)
## Summary of sample sizes: 97, 97, 97, 96, 97, 97, ...
## Resampling results across tuning parameters:
##
## mtry sampsize ntree ROC Sens Spec
## 3 50 200 0.9903810 0.8533333 1.0000000
## 3 50 300 0.9910159 0.8300000 1.0000000
## 3 50 400 0.9910159 0.8266667 1.0000000
## 3 100 200 0.9913333 0.8500000 1.0000000
## 3 100 300 0.9871429 0.8266667 1.0000000
## 3 100 400 0.9910159 0.8300000 1.0000000
## 3 200 200 0.9910159 0.8300000 1.0000000
## 3 200 300 0.9897460 0.8400000 1.0000000
## 3 200 400 0.9903492 0.8400000 0.9980952
## 4 50 200 0.9913333 0.8633333 1.0000000
## 4 50 300 0.9916508 0.8666667 1.0000000
## 4 50 400 0.9910159 0.8533333 1.0000000
## 4 100 200 0.9909841 0.9000000 1.0000000
## 4 100 300 0.9897143 0.8666667 1.0000000
## 4 100 400 0.9916508 0.8766667 1.0000000
## 4 200 200 0.9906984 0.8766667 1.0000000
## 4 200 300 0.9925873 0.8666667 1.0000000
## 4 200 400 0.9929206 0.8533333 1.0000000
## 5 50 200 0.9916508 0.9233333 1.0000000
## 5 50 300 0.9910159 0.8966667 1.0000000
## 5 50 400 0.9922857 0.8966667 1.0000000
## 5 100 200 0.9903810 0.8966667 1.0000000
## 5 100 300 0.9916508 0.8866667 1.0000000
## 5 100 400 0.9916508 0.9100000 1.0000000
## 5 200 200 0.9922857 0.8866667 1.0000000
## 5 200 300 0.9910159 0.8866667 1.0000000
## 5 200 400 0.9916508 0.8633333 1.0000000
##
## ROC was used to select the optimal model using the largest value.
## The final values used for the model were mtry = 4, ntree = 400 and sampsize
## = 200.
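With the tuned values selected, the final model can be refit once on the full training set and scored on held-out data. This is a sketch: the object names `train` and `test` and the target column name `target` are assumptions, while the hyperparameter values come from the tuning output above.

```r
library(randomForest)
library(caret)

# Refit with the selected hyperparameters: mtry = 4, ntree = 400, sampsize = 200.
# sampsize can exceed the number of training rows because sampling is done
# with replacement (replace = TRUE).
set.seed(123)
final_rf <- randomForest(target ~ ., data = train,
                         ntree = 400, mtry = 4,
                         replace = TRUE, sampsize = 200)

# Evaluate on the held-out test set.
preds <- predict(final_rf, newdata = test)
confusionMatrix(preds, test$target)
```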
# Evaluation of Model
What can you say about the results of the methods section as they relate to your question, given the limitations of your model?
What additional analysis is needed, or what limited your analysis in this project?